Taming Horizontal Instability in Merge Trees: On the Computation of a Comprehensive Deformation-based Edit Distance
Comparative analysis of scalar fields in scientific visualization often
involves distance functions on topological abstractions. This paper focuses on
the merge tree abstraction (representing the nesting of sub- or superlevel
sets) and proposes the application of the unconstrained deformation-based edit
distance. Previous approaches on merge trees often suffer from instability:
small perturbations in the data can lead to large distances between the
abstractions. While some existing methods can handle so-called vertical
instability, the unconstrained deformation-based edit distance addresses both
vertical and horizontal instabilities, the latter also called saddle swaps. We establish
the computational complexity as NP-complete, and provide an integer linear
program formulation for computation. Experimental results on the TOSCA shape
matching ensemble provide evidence for the stability of the proposed distance.
We thereby showcase the potential of handling saddle swaps for comparison of
scalar fields through merge trees.
Comparative Design-Choice Analysis of Color Refinement Algorithms Beyond the Worst Case
Color refinement is a crucial subroutine in symmetry detection in theory as well as practice. It has further applications in machine learning and in computational problems from linear algebra.
While tight lower bounds for the worst-case complexity are known [Berkholz, Bonsma, Grohe, ESA 2013], no comparative analysis of design choices for color refinement algorithms is available.
We devise two models within which we can compare color refinement algorithms using formal methods, an online model and an approximation model. We use these to show that no online algorithm is competitive beyond a logarithmic factor and no algorithm can approximate the optimal color refinement splitting scheme beyond a logarithmic factor.
We also directly compare strategies used in practice, showing that, on some graphs, queue-based strategies outperform stack-based ones by a logarithmic factor and vice versa. Similar results hold for strategies based on priority queues.
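For readers unfamiliar with the subroutine being compared, the following is a minimal sketch of naive color refinement, assuming an undirected graph given as an adjacency dictionary. The function name `color_refinement` and the input format are illustrative assumptions, not taken from the paper. It recomputes every vertex in each round, so it is not one of the queue-, stack-, or priority-queue-based worklist strategies the abstract analyzes.

```python
def color_refinement(adj):
    """Naive color refinement (hypothetical helper, not the paper's code).

    adj maps each vertex to an iterable of its neighbors.
    Returns a dict mapping each vertex to its stable color.
    """
    colors = {v: 0 for v in adj}  # start from the uniform coloring
    while True:
        # Signature: own color plus the sorted multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Relabel the distinct signatures with small integers.
        palette = {
            sig: i for i, sig in enumerate(sorted(set(signatures.values())))
        }
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Each round can only split classes, so an unchanged class count
        # means the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


# Usage: on a path with three vertices, the two endpoints share a color
# and the middle vertex receives a different one.
path_colors = color_refinement({0: [1], 1: [0, 2], 2: [1]})
```

The worklist strategies discussed in the abstract differ in which color class is used for splitting next; this sketch sidesteps that design choice at the cost of extra work per round.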
MixedTrails: Bayesian hypothesis comparison on heterogeneous sequential data
Sequential traces of user data are frequently observed online and offline,
e.g., as sequences of visited websites or as sequences of locations captured by
GPS. However, understanding factors explaining the production of sequence data
is a challenging task, especially since the data generation is often not
homogeneous. For example, navigation behavior might change in different phases
of browsing a website, or movement behavior may vary between groups of users.
In this work, we tackle this task and propose MixedTrails, a Bayesian approach
for comparing the plausibility of hypotheses regarding the generative processes
of heterogeneous sequence data. Each hypothesis is derived from existing
literature, theory or intuition and represents a belief about transition
probabilities between a set of states that can vary between groups of observed
transitions. For example, when trying to understand human movement in a city
and given some observed data, a hypothesis assuming tourists to be more likely
to move towards points of interest than locals can be shown to be more
plausible than a hypothesis assuming the opposite. Our approach incorporates
such hypotheses as Bayesian priors in a generative mixed transition Markov
chain model, and compares their plausibility utilizing Bayes factors. We
discuss analytical and approximate inference methods for calculating the
marginal likelihoods for Bayes factors, give guidance on interpreting the
results, and illustrate our approach with several experiments on synthetic and
empirical data from Wikipedia and Flickr. Thus, this work enables a novel kind
of analysis for studying sequential data in many application areas.
Comment: Published in Data Mining and Knowledge Discovery (2017) and presented
at ECML PKDD 201
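As a rough illustration of comparing hypotheses with Bayes factors, the sketch below computes the closed-form marginal likelihood of a homogeneous first-order Markov chain under Dirichlet priors and takes the difference of two log marginal likelihoods. This is a simplification under stated assumptions: it covers only a single group of transitions, not the mixed transition model of MixedTrails, and the function names, the count matrix, and the prior pseudo-counts are all hypothetical.

```python
from math import lgamma


def log_marginal_likelihood(counts, alpha):
    """Log evidence of observed transitions under Dirichlet row priors.

    counts[i][j]: observed transitions from state i to state j.
    alpha[i][j]:  Dirichlet pseudo-counts encoding a hypothesis (all > 0).
    """
    total = 0.0
    for n_row, a_row in zip(counts, alpha):
        # Normalizing terms of the Dirichlet-multinomial for this row.
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        # Per-transition terms.
        total += sum(lgamma(a + n) - lgamma(a) for a, n in zip(a_row, n_row))
    return total


def log_bayes_factor(counts, alpha_1, alpha_2):
    """Positive values favor hypothesis 1, negative values hypothesis 2."""
    return (log_marginal_likelihood(counts, alpha_1)
            - log_marginal_likelihood(counts, alpha_2))


# Usage with made-up data: two states, transitions mostly stay in place.
observed = [[9, 1], [1, 9]]
prior_stay = [[5, 1], [1, 5]]   # hypothesis: walkers tend to stay
prior_move = [[1, 5], [5, 1]]   # hypothesis: walkers tend to switch
log_bf = log_bayes_factor(observed, prior_stay, prior_move)
```

Working in log space avoids overflow in the gamma functions for realistic transition counts.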
Whole-genome sequencing identifies rare genotypes in COMP and CHADL associated with high risk of hip osteoarthritis.
We performed a genome-wide association study of total hip replacements, based on variants identified through whole-genome sequencing, which included 4,657 Icelandic patients and 207,514 population controls. We discovered two rare signals that strongly associate with osteoarthritis total hip replacement: a missense variant, c.1141G>C (p.Asp369His), in the COMP gene (allelic frequency = 0.026%, P = 4.0 × 10^-12, odds ratio (OR) = 16.7) and a frameshift mutation, rs532464664 (p.Val330Glyfs*106), in the CHADL gene that associates through a recessive mode of inheritance (homozygote frequency = 0.15%, P = 4.5 × 10^-18, OR = 7.71). On average, c.1141G>C heterozygotes and individuals homozygous for rs532464664 had their hip replacement operation 13.5 years and 4.9 years earlier than others (P = 0.0020 and P = 0.0026), respectively. We show that the full-length CHADL transcript is expressed in cartilage. Furthermore, the premature stop codon introduced by the CHADL frameshift mutation results in nonsense-mediated decay of the mutant transcripts.